A Probabilistic Model for Stemmer Generation

Authors

  • Michela Bacchin
  • Nicola Ferro
  • Massimo Melucci

Abstract

Today, managing textual resources and providing full-text search capabilities on them is a relevant issue for database management systems as well. Stemming is part of the indexing and searching processes when we deal with textual resources. In this paper we present a language-independent probabilistic model which can automatically generate stemmers for several different languages. The variety of word forms makes the match between the end user's words and the document words impossible even if they refer to the same concept, and this mismatch degrades retrieval performance. Stemmers can improve retrieval effectiveness, but their design and implementation require a laborious amount of effort. The proposed model describes the mutual reinforcement relationship between stems and derivations and then provides a probabilistic interpretation of it. A series of experiments shows that the stemmers generated by the probabilistic model are as effective as the ones based on linguistic knowledge.
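The abstract does not say how the mutual reinforcement between stems and derivations is computed. As a rough illustration only, the idea can be sketched as a HITS-style iteration over every way of cutting a word into a prefix and a suffix: a candidate stem scores well if it combines with well-scored suffixes, and vice versa. The function names, the choice of HITS-style updates, and the L2 normalization are assumptions made for this sketch, not the authors' method.

```python
import math


def normalize(scores):
    """Scale scores to unit L2 norm so they stay bounded across iterations."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {k: v / norm for k, v in scores.items()}


def reinforce(words, iterations=10):
    """Hypothetical HITS-style scoring of candidate stems and suffixes."""
    # Assumption: every cut of a word into a non-empty prefix and a
    # non-empty suffix is a candidate (stem, suffix) pair.
    edges = sorted({(w[:i], w[i:]) for w in set(words) for i in range(1, len(w))})
    stem = {p: 1.0 for p, _ in edges}
    suffix = {s: 1.0 for _, s in edges}
    for _ in range(iterations):
        # A suffix gains score from the stems it attaches to.
        new_suffix = dict.fromkeys(suffix, 0.0)
        for p, s in edges:
            new_suffix[s] += stem[p]
        # A stem gains score from the suffixes it takes.
        new_stem = dict.fromkeys(stem, 0.0)
        for p, s in edges:
            new_stem[p] += new_suffix[s]
        stem = normalize(new_stem)
        suffix = normalize(new_suffix)
    return stem, suffix


words = ["walk", "walks", "walked", "walking",
         "talk", "talks", "talked", "talking"]
stem_scores, suffix_scores = reinforce(words)
```

On this toy vocabulary, a prefix that combines with several suffixes (such as `walk`) ends up scored above one that combines with a single suffix (such as `walki`). The scores only rank candidates; how the paper turns them into probabilities and selects a split is not stated in the abstract and is not reproduced here.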

Similar resources

Performance Modeling of Power Generation System of a Thermal Plant

The present paper discusses the development of a performance model of power generation system of a thermal plant for performance evaluation using Markov technique and probabilistic approach. The study covers two areas: development of a predictive model and evaluation of performance with the help of developed model. The present system of thermal plant under study consists of four subsystems with...

To stem or lemmatize a highly inflectional language in a probabilistic IR environment?

Effects of three different morphological methods (lemmatization, stemming and inflectional stem generation) for Finnish are compared in a probabilistic IR environment (INQUERY). Evaluation is done using a four-point relevance scale which is partitioned differently in different test settings. Results show that inflectional stem generation, which has not been used much in IR, compares well with lemm...

Loss Reduction in a Probabilistic Approach for Optimal Planning of Renewable Resources

Clean and sustainable renewable energy technology is going to take responsibility for energy supply in electrical power systems. Using renewable sources improves the environment and reduces dependence on oil and other fossil fuels. In distribution power systems, utilizing wind and solar DGs offers several advantages, including loss and emission reduction and improvement of voltage profile...

Stemming in Agglutinative Languages: A Probabilistic Stemmer for Turkish

In this paper, we introduce a new lexicon-free, probabilistic stemmer used in a developing Turkish Information Retrieval system. It has linear computational complexity and its test success ratio is 95.8%. The main contribution of this paper is to give a thorough description of a probabilistic perspective for stemming which can also be generalized to apply to other agglutinative languages like...

A Probabilistic Topic Model Based on Local Word Relationships in Overlapping Windows

A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents, to extract the topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...

Publication date: 2004